robust accuracy
fb4c48608ce8825b558ccf07169a3421-Supplemental.pdf
In this section, we perform additional diagnostics that give us confidence that our models are not doing any form of gradient obfuscation or masking [3, 53]. First, we report in Table 4 the robust accuracy obtained by our strongest models against a diverse set of attacks. The cascade is composed as follows: AUTOPGD-CE, an untargeted attack using PGD with an adaptive step on the cross-entropy loss [10], AUTOPGD-T, a targeted attack using PGD with an adaptive step on the difference of logits ratio [10], FAB-T, a targeted attack which minimizes the norm of adversarial perturbations [9], SQUARE, a query-efficient black-box attack [1]. First, we observe that our combination of attacks, denoted AA+MT matches the final robust accuracy measured by AUTOATTACK. Second, we also notice that the black-box attack (i.e., SQUARE) does not find any additional adversarial examples.
Data Augmentation Can Improve Robustness
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on reducing robust overfitting by using common data augmentation schemes. We demonstrate that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy. Furthermore, we compare various data augmentations techniques and observe that spatial composition techniques work best for adversarial training. Finally, we evaluate our approach on CIFAR-10 against ` and `2 norm-bounded perturbations of size = 8/255 and = 128/255, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. In particular, against ` norm-bounded perturbations of size = 8/255, our model reaches 60.07%
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
Research on adversarial robustness is primarily focused on image and text data. Yet, many scenarios in which lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems often do not rely on images or text but instead on tabular data. Adversarial robustness in tabular data poses two serious challenges. First, tabular datasets often contain categorical features, and therefore cannot be tackled directly with existing optimization procedures. Second, in the tabular domain, algorithms that are not based on deep networks are widely used and offer great performance, but algorithms to enhance robustness are tailored to neural networks (e.g.
setup
The implementation of the following setup is written in JAX [6] and Haiku [35]. We use Residual Networks (ResNets) and Wide ResNets (WRNs) [31, 79]. This is consistent with prior work [30, 49, 60, 72, 82] which use diverse variants of these network families. Furthermore, we adopt the same architecture details as Gowal et al. [30] with Swish/SiLU [33] activation functions. Most of the experiments are conducted on a WRN-28-10 model which has a depth of 28, a width multiplier of 10 and contains 36M parameters. To evaluate the effect of using additional generated data on wider and deeper networks, we also run several experiments using WRN-70-16, which contains 267M parameters.